7 research outputs found

    A structural and quantitative analysis of the Web of Linked Data and its components to perform data retrieval

    This research consists of a quantitative and structural analysis of the Web of Linked Data, with the aim of improving data retrieval across different sources. Statistical techniques are applied to obtain quantitative metrics of the Web of Linked Data; for the structural analysis, we perform a Social Network Analysis (SNA). To get an overview of the Web of Linked Data suitable for analysis, we rely on the Linking Open Data (LOD) cloud diagram, an online catalogue of datasets whose information has been published using Linked Data techniques. The datasets are published in a language called Resource Description Framework (RDF), which creates links between them so that the information can be reused. The goal of obtaining a quantitative and structural analysis of the Web of Linked Data is to improve data retrieval. For that purpose, we take advantage of the Schema.org markup vocabulary and the Linked Open Vocabularies (LOV) project. Schema.org is a set of tags that lets webmasters mark up their own web pages with microdata; microdata helps search engines and other web tools better understand the information those pages contain. LOV is a catalogue that registers the vocabularies used by the datasets of the Web of Linked Data, with the goal of providing easy access to those vocabularies. In this research, we develop a study for retrieving data from the Web of Linked Data using the sources mentioned above together with ontology-matching techniques. In our case, we first map Schema.org to LOV, and then LOV to the Web of Linked Data. An SNA of LOV has also been carried out, with the aim of obtaining a quantitative and qualitative picture of LOV. From this we can draw conclusions such as which vocabularies are the most used, and whether or not they are specialised in a particular field.
    These findings can be used to filter datasets or to reuse information
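The ontology-matching step mentioned above (mapping Schema.org terms to LOV vocabulary terms) could, in its simplest label-based form, be sketched as follows. The term lists are hypothetical stand-ins, and real matchers would also exploit structure and semantics, not just string similarity:

```python
# Minimal sketch of label-based ontology matching: align Schema.org
# class names with (hypothetical) LOV vocabulary terms by comparing
# normalised labels. Real matchers also use structural and semantic cues.
from difflib import SequenceMatcher

schema_org_terms = ["Person", "CreativeWork", "PostalAddress"]
lov_terms = ["foaf:Person", "dcterms:CreativeWork", "vcard:Address"]

def normalise(term: str) -> str:
    # Drop the namespace prefix and lowercase the local name.
    return term.split(":")[-1].lower()

def best_match(term: str, candidates: list[str]) -> tuple[str, float]:
    # Return the candidate with the highest string similarity.
    scored = [(c, SequenceMatcher(None, normalise(term), normalise(c)).ratio())
              for c in candidates]
    return max(scored, key=lambda pair: pair[1])

for term in schema_org_terms:
    match, score = best_match(term, lov_terms)
    print(f"{term} -> {match} (similarity {score:.2f})")
```

A second pass of the same shape would then map the matched LOV terms onto dataset vocabularies in the Web of Linked Data.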

    A comparison of neural and non-neural machine learning models for food safety risk prediction with European Union RASFF data.

    The European Union launched the RASFF portal in 1977 to ensure cross-border monitoring and a quick reaction when public health risks are detected in the food chain. There are not enough resources available to guarantee a comprehensive inspection policy, but RASFF data has enormous potential as a preventive tool. However, there are few studies on predicting food and feed risk issues, and none using RASFF data. Although deep learning models are good prediction systems, it remains to be confirmed whether they outperform other machine learning techniques in this field. The encoding of categorical variables as input for numerical models deserves particular study. Results in this paper show that deep learning with entity embedding is the best combination, with accuracies of 86.81%, 82.31%, and 88.94% in each of the three stages of the simplified RASFF process in which the tests were carried out. However, random forest models with one-hot encoding offer only slightly worse results, so it seems that the encoding carries more weight in the quality of the results than the prediction technique. Our work also demonstrates that probabilistic predictions (an advantage of neural models) can be used to optimize the number of inspections that can be carried out.
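The two categorical encodings compared above can be illustrated with a minimal sketch. The product categories and embedding dimension are invented for illustration; in the paper, the dense embedding values would be learned during training rather than randomly initialised:

```python
# Minimal sketch of the two categorical encodings compared in the paper:
# sparse one-hot vectors (used with random forests) versus dense, learned
# "entity embeddings" (used with the deep model). Categories and the
# embedding dimension are hypothetical.
import random

categories = ["fish", "nuts", "poultry", "fruit"]  # e.g. RASFF product types
index = {cat: i for i, cat in enumerate(categories)}

def one_hot(cat: str) -> list[int]:
    # Sparse encoding: one component per category, a single 1.
    vec = [0] * len(categories)
    vec[index[cat]] = 1
    return vec

# An entity embedding replaces the one-hot vector with a small dense
# vector whose values are *learned* during training; here a randomly
# initialised 2-dimensional table stands in for the trained weights.
random.seed(0)
embedding_table = {cat: [random.uniform(-1, 1) for _ in range(2)]
                   for cat in categories}

print(one_hot("nuts"))          # [0, 1, 0, 0]
print(embedding_table["nuts"])  # dense 2-d vector, learned in practice
```

The one-hot vector grows with the number of categories, while the embedding stays at a fixed, small dimension and can place similar categories near each other.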

    A domain categorisation of vocabularies based on a deep learning classifier.

    The publication of large amounts of open data has become a major trend nowadays. This is a consequence of projects like the Linked Open Data (LOD) community, which publishes and integrates datasets using techniques like Linked Data. Linked Data publishers should follow a set of principles for dataset design, described in a 2011 document that covers tasks such as considering the reuse of vocabularies. With regard to the latter, another project called Linked Open Vocabularies (LOV) attempts to compile the vocabularies used in LOD. These vocabularies have been classified by domain following the subjective criteria of LOV members, which carries the inherent risk of introducing personal biases. In this paper, we present an automatic classifier of vocabularies based on the main categories of the well-known knowledge source Wikipedia. For this purpose, word-embedding models were used in combination with Deep Learning techniques. Results show that with a hybrid model of a regular Deep Neural Network (DNN), a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN), vocabularies could be classified with an accuracy of 93.57 per cent. Specifically, 36.25 per cent of the vocabularies belong to the Culture category.
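The core idea of embedding-based classification can be sketched without the deep networks: represent a vocabulary by the average embedding of its descriptive terms, then assign the nearest category. The tiny 2-d embeddings and two categories below are hand-made stand-ins for real word2vec-style vectors and the actual Wikipedia taxonomy:

```python
# Minimal sketch of the classification idea: average the word embeddings
# of a vocabulary's terms and assign the Wikipedia category whose
# centroid is closest. Embeddings and categories are hypothetical.
import math

# Hypothetical 2-d word embeddings.
embeddings = {
    "music": [0.9, 0.1], "art": [0.8, 0.2], "painting": [0.85, 0.15],
    "gene": [0.1, 0.9], "protein": [0.15, 0.85], "cell": [0.2, 0.8],
}
category_centroids = {"Culture": [0.85, 0.15], "Science": [0.15, 0.85]}

def average(vectors):
    # Component-wise mean of a list of equal-length vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def classify(terms):
    doc = average([embeddings[t] for t in terms if t in embeddings])
    return min(category_centroids,
               key=lambda c: math.dist(doc, category_centroids[c]))

print(classify(["music", "art"]))     # Culture
print(classify(["gene", "protein"]))  # Science
```

The DNN/RNN/CNN hybrid in the paper replaces this nearest-centroid rule with learned decision boundaries, but the input representation follows the same word-embedding principle.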

    On the graph structure of the Web of Data

    This article describes how the Web of Data has emerged as the realization of a machine-readable web relying on the Resource Description Framework language as a way to provide richer semantics to datasets. While the Web of Data is based on similar principles as the original Web, with interlinking being the principal mechanism to relate information, the differences in the structure of the information are evident. Several studies have analysed the graph structure of the Web, yielding important insights that were used in relevant applications. However, those findings cannot be transposed to the Web of Data, due to fundamental differences in production, link creation and usage. This article reports on a study of the graph structure of the Web of Data using methods and techniques from similar studies of the Web. Results show that the Web of Data also complies with the bow-tie theory. Other characteristics are the low distance between nodes and the low closeness and degree centrality. Regarding the datasets, the biggest one is Open Data Euskadi, but the one with the most connections to other datasets is DBpedia.
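The kind of centrality measurement reported above can be sketched on a toy directed graph of dataset links. The edges below are illustrative, not the study's actual data:

```python
# Minimal sketch of degree centrality on a toy directed graph of
# dataset links; node names are illustrative only.
edges = [("DBpedia", "GeoNames"), ("DBpedia", "Wordnet"),
         ("OpenDataEuskadi", "DBpedia"), ("GeoNames", "DBpedia")]
nodes = {n for e in edges for n in e}

def degree_centrality(node: str) -> float:
    # In-degree plus out-degree, normalised by (n - 1); in a directed
    # graph this value can exceed 1 for heavily linked nodes.
    deg = sum(1 for s, t in edges if node in (s, t))
    return deg / (len(nodes) - 1)

for n in sorted(nodes):
    print(n, round(degree_centrality(n), 2))
```

On the real Web of Data graph, the same computation run over millions of links is what surfaces DBpedia as the most connected dataset.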

    Network analysis for food safety: Quantitative and structural study of data gathered through the RASFF system in the European Union.

    This paper reports a quantitative and structural analysis of data gathered on the food issues reported by the European Union members over the last forty years. The study applies statistical measures and network analysis techniques. For this purpose, a graph was constructed of how different contaminated products have been distributed through countries. The work aims to leverage insights into the structure formed by the involvement of European countries in the exchange of goods that can cause problems for populations. The results obtained show the roles of different countries in the detection of sensitive routes. In particular, the analysis identifies problematic origin countries, such as China or Turkey, whereas European countries, in general, do have good border control policies for the import/export of food.
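The structural analysis above starts from pairwise flows between countries; a minimal sketch of ranking origin countries by notification volume could look like the following. The alert pairs are invented for illustration:

```python
# Minimal sketch of the structural analysis: treat each food-safety
# alert as an (origin, notifying_country) edge and rank origins by how
# many notifications they generate. The data below is invented.
from collections import Counter

# One hypothetical RASFF alert per pair.
notifications = [
    ("China", "Germany"), ("China", "Spain"), ("Turkey", "Germany"),
    ("China", "Italy"), ("Turkey", "France"), ("Brazil", "Spain"),
]

origin_counts = Counter(origin for origin, _ in notifications)
for country, count in origin_counts.most_common():
    print(country, count)
```

On the real forty-year dataset, the same aggregation, combined with the graph measures, is what exposes recurring problematic origins and sensitive routes.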

    BERT Learns From Electroencephalograms About Parkinson's Disease: Transformer-Based Models for Aid Diagnosis.

    Medicine is a complex field with highly trained specialists whose extensive knowledge continuously needs updating. Among them, those who study the brain face particularly complex tasks due to the structure of this organ. There are neurological diseases, such as degenerative ones, whose diagnosis at very early stages is essential. Parkinson's disease is one of them, usually only receiving a confirmed diagnosis when it is already very developed. Some physicians have proposed using electroencephalograms as a non-invasive method for a prompt diagnosis. The problem with these tests is that data analysis relies on the clinical eye of a very experienced professional, which entails the risk that relevant patterns escape human perception. This research proposes the use of deep learning techniques in combination with electroencephalograms to develop a non-invasive method for Parkinson's disease diagnosis. These models have demonstrated good performance in managing massive amounts of data. Our main contribution is to apply models from the field of Natural Language Processing, particularly an adaptation of BERT models, as they represent the latest milestone in the area. This model choice is due to the similarity between texts and electroencephalograms, both of which can be processed as data sequences. Results show that the best model uses 64-channel electroencephalograms from people in non-resting states performing finger-tapping tasks. In terms of metrics, the model achieves values around 86%.
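The text/EEG analogy above rests on turning a continuous recording into a sequence a transformer-style model can attend over; one common way, sketched here under the assumption of fixed-length overlapping windows, is to slice each channel into window "tokens". The signal is synthetic; real inputs would be 64-channel recordings:

```python
# Minimal sketch of the text/EEG analogy: slice a signal into
# fixed-length overlapping windows so it becomes a sequence of
# "tokens" a transformer-style model could process. Window size and
# step are hypothetical choices.
def windows(signal, size, step):
    # Yield overlapping windows, the EEG analogue of a token sequence.
    return [signal[i:i + size]
            for i in range(0, len(signal) - size + 1, step)]

samples = list(range(10))  # stand-in for one EEG channel
tokens = windows(samples, size=4, step=2)
print(tokens)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Each window then plays the role a word embedding plays in text, giving the BERT-style model a position-ordered sequence to attend over.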